Skip to content

Conversation

andypols
Copy link
Contributor

Summary

This PR fixes a bug where pushes to whitelisted repositories were rejected because the entered repository name didn’t exactly match the actual repository URL.

It’s related to the broader feature in PR #1043 (support for non-GitLab repos), but has been separated into a smaller, focused fix to unblock the security team, who are currently running penetration tests using GitLab. It also improves efficiency by loading a repository directly from the database instead of scanning all repositories on each commit.

Root Cause

The code previously assumed that repository names followed a strict ${project}/${name} naming convention (e.g., finos/git-proxy). However, this assumption doesn't hold in our case. For example:

  • Project: Security & Compliance
  • Name: GitProxy
  • Actual repo URL: security-and-compliance/git-proxy

Since the naming did not match the expected format, the push failed.

Fix

This PR removes the naming convention assumption and instead matches repositories based on their full URL. Specifically:

  • Introduces getRepoByUrl (with unit tests) for both file-based and MongoDB-backed repo sources.
  • Simplifies checkRepoInAuthorisedList by directly loading a matching repo via its URL instead of looping through all repos.
  • Updates the checkRepoInAuthorisedList test. I also moved it to be with the other processor action tests.
  • Adds support for handling mistakenly entered .git suffixes in repo URLs (which we also did by mistake).

Copy link

netlify bot commented Jul 17, 2025

Deploy Preview for endearing-brigadeiros-63f9d0 canceled.

Name Link
🔨 Latest commit 767ca64
🔍 Latest deploy log https://app.netlify.com/projects/endearing-brigadeiros-63f9d0/deploys/68c0e3e4c36c25000863efce

@github-actions github-actions bot added the fix label Jul 17, 2025
Copy link

codecov bot commented Jul 17, 2025

Codecov Report

✅ All modified and coverable lines are covered by tests.
✅ Project coverage is 83.44%. Comparing base (4a0fe55) to head (767ca64).
⚠️ Report is 19 commits behind head on main.

Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1109      +/-   ##
==========================================
+ Coverage   83.22%   83.44%   +0.22%     
==========================================
  Files          66       66              
  Lines        2795     2790       -5     
  Branches      332      332              
==========================================
+ Hits         2326     2328       +2     
+ Misses        423      413      -10     
- Partials       46       49       +3     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Copy link
Contributor

@kriswest kriswest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I could do with some clarification of why the repository name didn't exactly match. Until #1043 is merged, git-proxy can't support multiple forks of the same repo (project or organisation name is different, repo name the same). This PR addresses one or two of those but, by including only a subset of the changes from #1043 it doesn't give you the ability work with multiple forks through the API or UI.

If the project is a going to accept #1043 soon (and on the v1 line - as seemed to be the outcome from the last meeting) then I would question if this this should go in first.

@andypols is any part of this PR NOT included in #1043 that I should be considering adding to it?

@andypols
Copy link
Contributor Author

I could do with some clarification of why the repository name didn't exactly match. Until #1043 is merged, git-proxy can't support multiple forks of the same repo (project or organisation name is different, repo name the same). This PR addresses one or two of those but, by including only a subset of the changes from #1043 it doesn't give you the ability work with multiple forks through the API or UI.

The discrepancy here is due to testing against GitLab, which allows arbitrary display names for repos. For example, I have a repo named Git Proxy Infra, but the actual repo URL in the git push is git-proxy-infra. So the display name and the slug used in the push don't exactly match — which causes issues with how git-proxy currently resolves repositories. People tend to enter the name they see in the UI.

@kriswest
Copy link
Contributor

kriswest commented Jul 18, 2025

@andypols That makes sense. I believe that in #1043 we entirely remove the user of the repoName across the proxy, API and UI, other than for display purposes (as you do here in just the proxy using the same approach and some of the same code - but with more work on the tests).

TBH I thought about remove the project/user fields entirely as we retrieve the same info from the Github and Gitlab APIs... I was stalled in the end by the idea that you would want a manually entered one for standalone git repos or other SCM hosts we don't have a specific integration for + the fact someone could be relying on them in a plugin implementation. The project (organisation) field is easier to remove as thats a github/gitlab concept (and differs between them as Gitlab supports many levels of sub-orgs), which could come from the respective APIs (the field is returned but we don't currently need it, instead we just extract the URL to it in utils.tsx / fetchRemoteRepositoryData)

@andypols
Copy link
Contributor Author

@kriswest I agree about removing the project/name fields when adding a repository. All you need is the clone URL

@andypols
Copy link
Contributor Author

@andypols is any part of this PR NOT included in #1043 that I should be considering adding to it?

@kriswest no - I did to address this one issue and unblock our security team

@kriswest
Copy link
Contributor

@finos/git-proxy-maintainers @andypols and I have been chatting off to the side and essentially we have the same goal, we need these fixes to go in ASAP. He's raised this PR and #1110 in the hopes of speeding up resolution / adoption of some of the changes (+some work to improve the approach to building tests, particularly of the DB clients).

If #973 and #1043 go in soon, then we only need the test additions from these PRs and and I'll work (with @andypols) on getting them in. If there is likely to be an delay then #1109 and #1110 should be prioritised (and i'll have to maintain the others further). My preference is obviously for the former option to happen ASAP - but I don't want to unduly block another adopter if we can't get them merged ASAP.

@jescalada I'm conscious that I haven't looked at your PRs yet, but am happy to help in anyway that gets us to the end goal here ;-)

@jescalada
Copy link
Contributor

@kriswest We're waiting for a FINOS admin to help us unblock the PRs we talked about in the call. Maintainers cannot actually merge them directly through GitHub so we would need extended permissions or coordinate with an admin to get them in...

If it's not too much of a hassle to keep maintaining your PRs, I can take a look at #1109, #1110 and the other fix PRs by @andypols and make a patch release so their org can keep pentesting smoothly (which I believe is valuable to all of us).

As for my PRs, a quick look would be appreciated but they are pretty much ready to merge barring the permissions issue 🙂

@jescalada
Copy link
Contributor

jescalada commented Jul 19, 2025

I agree about removing the project/name fields when adding a repository. All you need is the clone URL

Seconding this for adding repos - it's simple and reliable. We can prepopulate the other values (repo name, organization, etc.) after fetching from the repo URL.

Copy link
Contributor

@jescalada jescalada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looks good! Here are some tests that might help in spite of the full coverage, to prevent regressions and inconsistent behaviour between the two db methods.

andypols and others added 3 commits July 19, 2025 20:34
@andypols
Copy link
Contributor Author

andypols commented Jul 19, 2025

@jescalada – thanks for the test suggestions; I've added them. The one testing unknown URLs in mongo didn’t work at this level, though, since the unit test only checks queries against a mocked database. I’ve removed that one.

@kriswest
Copy link
Contributor

kriswest commented Aug 21, 2025

@andypols this can be rebased now - although the changes to how DBs are looked up in the DB by URL may have eliminated the need.

Copy link
Contributor

@kriswest kriswest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Rebase, check that its not reintroducing DB functions that have been removed on main and determine if still needed

@andypols
Copy link
Contributor Author

andypols commented Aug 22, 2025

@kriswest I think there is a bug in your implementation of getRepoByUrl

export const getRepoByUrl = async (repoUrl: string): Promise<Repo | null> => {
  const collection = await connect(collectionName);
  const doc = collection.findOne({ name: { $eq: repoUrl.toLowerCase() } });
  return doc ? toClass(doc, Repo.prototype) : null;
};

Shouldn't this match on by url and not name?

@andypols
Copy link
Contributor Author

andypols commented Aug 22, 2025

@kriswest does it run for you on mongo? - the checked in code on main is not working for me.

The collection functions are promise based and are all returning empty documents.

For example:

const doc = collection.findOne({ name: { $eq: name } });

should be:

const doc = await collection.findOne({ name: { $eq: name } });

@kriswest
Copy link
Contributor

@kriswest I think there is a bug in your implementation of getRepoByUrl

export const getRepoByUrl = async (repoUrl: string): Promise<Repo | null> => {
  const collection = await connect(collectionName);
  const doc = collection.findOne({ name: { $eq: repoUrl.toLowerCase() } });
  return doc ? toClass(doc, Repo.prototype) : null;
};

Shouldn't this match on by url and not name?

I'll look into this - there were so many rounds of dealing with conflicts on this PR that there may be a merge issue there.

@kriswest
Copy link
Contributor

@andypols nope looks like a straight-up mistake, I'm not setup to test against mongo on my machine and, although it looks like there was a start on creating a mongo testing setup in the repo it was incomplete the last time I checked. Thats clearly something we need to change @finos/git-proxy-maintainers.

I've raised a bug fix PR (#1167), but can't test it until I have time to get some setup done here. If you have a moment to check we could move that forward today.

# Conflicts:
#	src/db/file/index.ts
#	src/db/file/repo.ts
#	src/db/index.ts
#	src/db/mongo/index.ts
#	src/db/mongo/repo.ts
#	src/proxy/processors/push-action/checkRepoInAuthorisedList.ts
#	test/testCheckRepoInAuthList.test.js
@andypols
Copy link
Contributor Author

andypols commented Sep 3, 2025

@finos/git-proxy-maintainers : I have updated this, please can you review and merge if happy. All that is left is a slight cleanup of the checkRepoInAuthorisedList code and additional tests

Thanks

Copy link
Contributor

@jescalada jescalada left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM! Would appreciate a final check from @kriswest before merging in 🚀

@jescalada jescalada requested a review from kriswest September 5, 2025 06:16
Copy link
Contributor

@kriswest kriswest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've got a question about dropping the fuzz test and think there might be couple more tests that couple be ported to the mongo test. I also wonder if mocking the DB calls is the right approach - or perhaps I should say 'I wonder if its sufficient', as we then don't test the construction of the DB queries and their behaviour when run through the database dependencies.

Other than those thoughts, I see nothing to object to and am of course happy with additional test coverage

@kriswest
Copy link
Contributor

kriswest commented Sep 5, 2025

The PR title/description could/should be updated before merge as the changes are now just the test additions (will make a better changelog entry).

@kriswest kriswest self-requested a review September 5, 2025 10:24
@andypols
Copy link
Contributor Author

andypols commented Sep 5, 2025

@kriswest and @jescalada I thoughts these may be useful. Feel free to close the PR as the important stuff was added in #1043. I don't have the bandwidth to rename the PR and make the changes at the moment.

@kriswest kriswest changed the title fix: checkRepoInAuthorisedList push failure on URL mismatch test: improve repo DB tests and CheckRepoInAuthList tests Sep 8, 2025
Copy link
Contributor

@kriswest kriswest left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Updated the title (as this now only relates to testing) and resolved a couple of the comments. I think its good to go - there's more that could be done, but we can take whats here now.

andypols added a commit to andypols/git-proxy that referenced this pull request Sep 9, 2025
@jescalada jescalada merged commit 6c516a3 into finos:main Sep 10, 2025
14 checks passed
@andypols andypols deleted the add-get-repo-by-url branch September 11, 2025 13:35
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants